Adding quantiles aggregation #1937

Merged: @brimoor merged 9 commits into develop from feature/quantiles on Jul 19, 2022

@brimoor commented Jul 12, 2022

Resolves #1935.

Adds a Quantiles aggregation that can be used to compute the quantiles of numeric data.

Notes

  • Unfortunately, MongoDB has no built-in operator for computing quantiles, so the implementation performs a full sort of the values. As a result, quantiles() requires O(n) memory (DB-side, not client-side), rather than O(1) like the other aggregations.
  • I benchmarked $group + JS sort against $sort + $group on a dataset of 100K samples, and the JS sort version was faster.
  • I also tried the approach from https://stackoverflow.com/a/60694525/16823653, which was extremely slow.
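
For readers unfamiliar with the two pipeline shapes compared above, here is a rough sketch of what they might look like when built as plain Python dicts. This is an illustration only, not the pipeline from this PR: the helper names `group_then_js_sort` and `sort_then_group`, the JS body, and the nearest-rank index rule are all assumptions. The stage and operator names (`$group`, `$push`, `$sort`, `$project`, `$function`) are standard MongoDB aggregation operators.

```python
from typing import Any, Dict, List

# Hypothetical JS body (an assumption, not taken from the PR): drop nulls,
# sort numerically, then pick each quantile with a nearest-rank index
_JS_QUANTILES = """
function(values, quantiles) {
    values = values.filter(function(v) { return v !== null; });
    values.sort(function(a, b) { return a - b; });
    return quantiles.map(function(q) {
        if (values.length === 0) { return null; }
        var k = Math.max(Math.ceil(q * values.length) - 1, 0);
        return values[k];
    });
}
""".strip()


def group_then_js_sort(field: str, quantiles: List[float]) -> List[Dict[str, Any]]:
    """Variant 1 (sketch): collect all values, then sort them in server-side JS."""
    return [
        # Gather every value of the field into one array: O(n) memory DB-side
        {"$group": {"_id": None, "values": {"$push": "$" + field}}},
        # Sort and index the array inside a $function expression
        {
            "$project": {
                "quantiles": {
                    "$function": {
                        "body": _JS_QUANTILES,
                        "args": ["$values", quantiles],
                        "lang": "js",
                    }
                }
            }
        },
    ]


def sort_then_group(field: str) -> List[Dict[str, Any]]:
    """Variant 2 (sketch): let MongoDB sort the documents, then collect values."""
    return [
        {"$sort": {field: 1}},
        # Assumes $push receives the documents in the order $sort emitted them
        {"$group": {"_id": None, "values": {"$push": "$" + field}}},
    ]
```

Either way, every value ends up in a single array on the database side, which is where the O(n) memory requirement comes from. In the second variant, the quantiles would still need to be read out of the sorted array in a later stage, which the sketch omits.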

Example usage

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

print(dataset.quantiles("uniqueness", [0.25, 0.50, 0.75]))
# [0.2202, 0.3377, 0.6255]

print(dataset.quantiles("uniqueness", 0.9))
# 0.6949

print(dataset.quantiles("predictions.detections.confidence", [0.25, 0.50, 0.75]))
# [0.0923, 0.2025, 0.5627]

print(dataset.quantiles("predictions.detections.confidence", 0.9))
# 0.9435
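
To make the semantics of the calls above concrete, here is a minimal client-side sketch of a sort-based quantile computation on an in-memory list. It mirrors the "full sort" idea from the notes and the scalar-in/scalar-out, list-in/list-out behavior shown in the example, but the function name `sort_based_quantiles` and the nearest-rank rule are assumptions; the PR does not state which index or interpolation rule `quantiles()` uses, so results may differ slightly from FiftyOne's.

```python
import math
from typing import List, Optional, Sequence, Union

Number = Union[int, float]


def sort_based_quantiles(
    values: Sequence[Optional[Number]],
    quantiles: Union[float, Sequence[float]],
) -> Union[Optional[Number], List[Optional[Number]]]:
    """Nearest-rank quantiles via a full sort: O(n) memory, O(n log n) time.

    Hypothetical helper for illustration; not part of FiftyOne.
    """
    scalar = isinstance(quantiles, (int, float))
    qs = [quantiles] if scalar else list(quantiles)

    for q in qs:
        if not 0 <= q <= 1:
            raise ValueError("Quantiles must be in [0, 1]; found %s" % q)

    # Ignore missing values, then sort everything that remains
    ordered = sorted(v for v in values if v is not None)

    if not ordered:
        results: List[Optional[Number]] = [None] * len(qs)
    else:
        n = len(ordered)
        # Nearest rank: the ceil(q * n)-th smallest value, clamped so q = 0
        # returns the minimum
        results = [ordered[max(math.ceil(q * n) - 1, 0)] for q in qs]

    return results[0] if scalar else results


print(sort_based_quantiles([5, 1, 4, 2, 3], [0.25, 0.50, 0.75]))
# [2, 3, 4]

print(sort_based_quantiles([5, 1, 4, 2, 3], 0.9))
# 5
```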

@brimoor added the feature label (Work on a feature request) Jul 12, 2022
@brimoor requested a review from a team July 12, 2022 15:21
@brimoor self-assigned this Jul 12, 2022

@benjaminpkane left a comment

LGTM

@brimoor merged commit 79fe485 into develop Jul 19, 2022
@brimoor deleted the feature/quantiles branch July 19, 2022 03:29
Successfully merging this pull request may close these issues.

[FR] Add percentile (quantile) aggregation